-
-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i.cca: use 0 based array indexing #3239
base: main
Are you sure you want to change the base?
Conversation
When ccmath library was introduced in 49b8066 most of arrays were moved to 0 based indexing but some locations got missed. Most likely it is the source of a crash reported a few years a go: https://trac.osgeo.org/grass/ticket/2297
Although this PR fixes segfault, the code does not produce correct output (in comparison to pre- 49b8066 version). Unless the testcase passes, this PR does not improve on the current state. |
imagery/i.cca/main.c
Outdated
cov[i][j][k] = cov[i][k][j] = sigs.sig[i - 1].var[j - 1][k - 1]; | ||
for (j = 0; j < bands; j++) { | ||
mu[i][j] = sigs.sig[i].mean[j]; | ||
for (k = 0; k < j; k++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try
for (k = 0; k < j; k++) { | |
for (k = 0; k <= j; k++) { |
because j is not the upper bound of k to be excluded but to be included, also with zero-based indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for spotting my mistake!
Still even with this being fixed, output is not identical to the old version. Should it be identical? Suggestions how to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions in stats.c assume zero-based indices, thus I would expect different results in the sense that results with your PR (zero based indices) are correct whereas previous results (one based indices) were not correct, particularly because the indexing to cov[][][]
did not change.
You would need some independent, trusted alternative for Canonical components analysis in order to decide which results are correct. Regarding the code, your PR seems like a bugfix, previous results are probably not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for clarification, cov
has been indexed previously in main.c
as cov[i][j][k]
which was wrong for one-based i, j, k because zero-based i, j, k are used in stats.c
. With your PR, cov
is consistently indexed with zero-based i, j, k, that could explain the differences.
Testing this PR I see that the component numbers in the warning differ from the generated output:
|
cls.runModule("g.region", raster="lsat7_2000_20") | ||
cls.group_name = tempname(10) | ||
cls.subgroup_name = "vis" | ||
cls.runModule( | ||
"i.group", | ||
group=cls.group_name, | ||
subgroup=cls.subgroup_name, | ||
input="lsat7_2000_20,lsat7_2000_30,lsat7_2000_40", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cls.runModule("g.region", raster="lsat7_2000_20") | |
cls.group_name = tempname(10) | |
cls.subgroup_name = "vis" | |
cls.runModule( | |
"i.group", | |
group=cls.group_name, | |
subgroup=cls.subgroup_name, | |
input="lsat7_2000_20,lsat7_2000_30,lsat7_2000_40", | |
) | |
cls.runModule("g.region", raster="lsat7_2002_20") | |
cls.group_name = tempname(10) | |
cls.subgroup_name = "vis" | |
cls.runModule( | |
"i.group", | |
group=cls.group_name, | |
subgroup=cls.subgroup_name, | |
input="lsat7_2002_20,lsat7_2002_30,lsat7_2002_40", | |
) |
There is an CI error: "b'ERROR: Raster map <lsat7_2000_20> not found\n'" - the available Landsat datasets are of 2002. Updated accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will probably affect the expected results, @marisn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Markus, I have had no time (and am not certain if my math level is up to task) to go over formulas from the original paper (or was it a technical document?) to create a synthetic dataset with known results as I did for the r.kappa module. This is the reason why this PR has not been merged as at the moment is possible only to test if module runs but not if it produces correct output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, funny understood.
So I'd suggest to commit the data name change and see what happens next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it went further, till here:
Running ./imagery/i.cca/testsuite/test_i_cca.py...
========================================================================
F
======================================================================
FAIL: test_output_values (__main__.OutputMatchTest.test_output_values)
Test correctness of i.cca output
----------------------------------------------------------------------
Traceback (most recent call last):
File "imagery/i.cca/testsuite/test_i_cca.py", line 83, in test_output_values
self.assertRasterMinMax(
File "etc/python/grass/gunittest/case.py", line 539, in assertRasterMinMax
self.fail(self._formatMessage(msg, stdmsg))
AssertionError: Wrong calculated value
The actual maximum (14) is greater than the reference one (-3) for raster map tmp_oy5WA6oFO0.2 (with minimum -17)
Just for fun, here an attempt for a synthetic Minimalistic i.cca Example with Synthetic DataSet a simple region for your synthetic raster maps, creating a 10x10 grid with a resolution of 10 units: g.region n=100 s=0 e=100 w=0 res=10 -p Generate synthetic raster maps by using # map1: Random values between 0 and 100
r.mapcalc "map1 = rand(0, 100)" -s
# map2: X-coordinate values with added noise
r.mapcalc "map2 = x() + rand(0, 10)" -s
map3: Y-coordinate values with added noise
r.mapcalc "map3 = y() + rand(0, 10)" -s
r.info -r map=map1
r.info -r map=map2
r.info -r map=map3
d.mon wx0
d.rast map1
d.rast map2
d.rast map3 Create an imagery group and subgroup with these synthetic raster maps: i.group group=synthetic_group subgroup=synthetic_subgroup \
input=map1,map2,map3 Create synthetic training data by generating a simple vector map with labeled polygons for training. This creates:
v.random output=training_points n=5 seed=1
v.db.addtable map=training_points columns="class INTEGER"
v.db.update map=training_points column=class value=cat
v.db.select map=training_points
v.buffer input=training_points output=training_areas distance=20 -t
v.db.select map=training_areas
v.db.update map=training_areas column=class value=1 where="cat <= 2"
v.db.update map=training_areas column=class value=2 where="cat == 3"
v.db.update map=training_areas column=class value=3 where="cat > 3"
v.db.select map=training_areas
d.vect training_areas -c Rasterize the training areas: v.to.rast input=training_areas output=training_areas use=attr attribute_column=class
r.info map=training_areas -r
r.category map=training_areas
d.erase
d.rast training_areas Next generate a signature file from the synthetic training data: i.gensig trainingmap=training_areas group=synthetic_group subgroup=synthetic_subgroup \
signaturefile=synthetic_signatures Perform Canonical Components Analysis with i.cca: i.cca group=synthetic_group subgroup=synthetic_subgroup \
signature=synthetic_signatures output=cca_comp Display the results: d.rast map=cca_comp.1
d.rast map=cca_comp.2
d.rast map=cca_comp.3 Check statistics of the output raster maps: r.univar map=cca_comp.1
r.univar map=cca_comp.2
r.univar map=cca_comp.3
d.erase
d.rgb b=cca_comp.1 g=cca_comp.2 r=cca_comp.3 |
Markus, this still only tests if module runs and not if module runs correctly. The proper solution is to open http://dbwww.essc.psu.edu/lasdoc/user/canal.html, come up with some numbers, plug them into provided formulas and get results. Then they can be used to create a test case that validates GRASS code output. |
When ccmath library was introduced in 49b8066 most of arrays were moved to 0 based indexing but some locations got missed. Most likely it is the source of a crash reported a few years a go: https://trac.osgeo.org/grass/ticket/2297
With this fix I'm able to run the code without a segfault, but I haven't figured out how to get any output and thus no tests are provided.